Reading the Excel file that contains the football players' statistics, then creating a new text file and appending the Excel data to it.

Dataframe Dimensions

Checking for missing values

The column 'Unnamed: 0' is just the row index number, so we can remove it.
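A minimal pandas sketch of these three checks, assuming the Excel data has already been loaded into a DataFrame (for example with `pd.read_excel`). The helper name `inspect_and_clean` and the toy data are hypothetical.

```python
import pandas as pd

def inspect_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Report dimensions and missing values, then drop the leftover index column."""
    print("Dimensions (rows, columns):", df.shape)
    missing = df.isnull().sum()
    print(missing[missing > 0])  # only the columns that contain nulls
    # 'Unnamed: 0' is the saved row index, not a player attribute
    return df.drop(columns=["Unnamed: 0"], errors="ignore")

# Toy frame standing in for the loaded player data
toy = pd.DataFrame({"Unnamed: 0": [0, 1, 2],
                    "Name": ["A", "B", "C"],
                    "Club": ["X", None, "Y"]})
cleaned = inspect_and_clean(toy)
print(cleaned.columns.tolist())  # ['Name', 'Club']
```

`errors="ignore"` keeps the call safe if the column was already dropped.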

Correlation between the attributes we have

Based on the correlation above, goalkeepers show more variation than other (non-goalkeeper) players.

Top 5 countries in FIFA 19
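Counting players per nationality gives the top countries. This sketch assumes the country is stored in a `Nationality` column; the helper and the toy data are illustrative.

```python
import pandas as pd

def top_countries(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n nationalities with the most players, most frequent first."""
    return df["Nationality"].value_counts().head(n)

players = pd.DataFrame({"Nationality": ["England", "Germany", "England", "Spain",
                                        "Brazil", "England", "Germany", "France", "Italy"]})
print(top_countries(players))
```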

Selecting only the features relevant to analyzing the data for each use case.

UseCase 1: Which club is likely to concede the fewest goals during a season?

Data Preprocessing: more than 20 columns contain many null values. We chose to remove the records with null values instead of filling them with the mean, median or mode, because skill scores differ substantially from player to player, so an imputed value would not reflect the actual player.

Removing the players who do not belong to any club.
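A sketch of both filters in pandas. Restricting `dropna` to the skill columns is our assumption: dropping on every column could also remove players whose only gap is in an unrelated field. The helper name is hypothetical, and a `Club` column is assumed.

```python
import pandas as pd

def drop_incomplete_and_clubless(df: pd.DataFrame, skill_cols) -> pd.DataFrame:
    """Remove rows with missing skill scores, then rows with no club."""
    # No mean/median/mode imputation: drop the incomplete records instead
    complete = df.dropna(subset=skill_cols)
    # A player without a club cannot contribute to a club-level comparison
    return complete[complete["Club"].notna()]

sample = pd.DataFrame({"Club": ["A", None, "B", "C"],
                       "Marking": [50, 60, None, 70]})
print(drop_incomplete_and_clubless(sample, ["Marking"]))
```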

To address Use Case 1, we sort the players by defensive position and defensive skill scores, then sum the scores per club to find the clubs with the most defensive points.

Creating an array with the sum of all defensive skill scores and adding it to the DataFrame as a new column.

Similarly, adding up all physical and mental skill scores to find the clubs whose players have the physical ability to play in defensive positions.

Adding the goalkeeper skill scores to a new array and assigning it to a DataFrame column. These scores are needed to assess a good defense, because goalkeepers play a major role in preventing opponents from scoring.
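The three summed scores can be sketched as row-wise sums in pandas. The defensive and physical/mental column groupings below are assumptions for illustration (the notebook does not list them), written with FIFA 19 attribute names; the new column names are hypothetical.

```python
import pandas as pd

# Hypothetical groupings; the notebook's exact lists are not shown
DEFENSIVE = ["Marking", "StandingTackle", "SlidingTackle", "Interceptions"]
FITNESS = ["Stamina", "Strength", "Jumping", "Aggression", "Composure"]
GOALKEEPING = ["GKDiving", "GKHandling", "GKKicking", "GKPositioning", "GKReflexes"]

def add_composite_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Add one summed score column per skill group (row-wise sum over the group)."""
    out = df.copy()
    out["DefensiveSkills"] = out[DEFENSIVE].sum(axis=1)
    out["FitnessScore"] = out[FITNESS].sum(axis=1)
    out["GoalkeepingSkills"] = out[GOALKEEPING].sum(axis=1)
    return out
```

Working on a copy leaves the original DataFrame unchanged.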

Selecting the players who play in defensive positions and storing them in a new DataFrame, since we concentrate mainly on the players in these positions.
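Filtering to defensive roles is a single `isin` check. The set of position codes below is an assumption (the text names only CDM and GK explicitly), using FIFA 19 position abbreviations; a `Position` column is assumed.

```python
import pandas as pd

# Assumed defensive position codes; adjust to match the analysis
DEFENSIVE_POSITIONS = {"GK", "CB", "LCB", "RCB", "LB", "RB",
                       "LWB", "RWB", "CDM", "LDM", "RDM"}

def defensive_players(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only players whose Position is one of DEFENSIVE_POSITIONS."""
    return df[df["Position"].isin(DEFENSIVE_POSITIONS)].copy()
```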

Plotting a line graph of the top 10 players by overall score.

Plotting a line graph of the top 10 players by defensive skill score.

Plotting a line graph of the top 10 players by goalkeeping skill score.
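All three line graphs follow the same pattern: take the 10 largest values of one score and plot them per player. A sketch with matplotlib, assuming a `Name` column and score columns such as `Overall`; the helper name is hypothetical, and the `Agg` backend is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_top_players(df: pd.DataFrame, score_col: str, n: int = 10):
    """Line plot of the n players with the highest score_col values."""
    top = df.nlargest(n, score_col)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(top["Name"].tolist(), top[score_col].tolist(), marker="o")
    ax.set_xlabel("Player")
    ax.set_ylabel(score_col)
    ax.set_title(f"Top {n} players by {score_col}")
    ax.tick_params(axis="x", labelrotation=45)  # keep long names readable
    fig.tight_layout()
    return top, fig
```

The same call with `"DefensiveSkills"` or `"GoalkeepingSkills"` (hypothetical column names) produces the other two graphs.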

Grouping by club to get each club's total overall score, defensive skill score, fitness score and goalkeeping skill score.

Summing the defensive skill and goalkeeping skill scores to get each club's total defensive points.

Create figure: bar plot of the top 10 clubs by defensive skill, fitness and goalkeeping scores. From the plot below we can conclude that Leicester City Football Club could concede fewer goals, as Leicester City has strong overall, goalkeeping, fitness and defensive scores.
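The grouping, the summation and the top 10 selection can be sketched as one pandas function. The score column names (`DefensiveSkills`, `FitnessScore`, `GoalkeepingSkills`, `DefensivePoints`) are hypothetical stand-ins for the notebook's own columns.

```python
import pandas as pd

SCORE_COLS = ["Overall", "DefensiveSkills", "FitnessScore", "GoalkeepingSkills"]

def top_defensive_clubs(players: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Total each score per club, then rank clubs by defensive + goalkeeping points."""
    totals = players.groupby("Club")[SCORE_COLS].sum()
    totals["DefensivePoints"] = totals["DefensiveSkills"] + totals["GoalkeepingSkills"]
    return totals.sort_values("DefensivePoints", ascending=False).head(n)
```

Calling `.plot.bar()` on the returned frame's score columns then draws the grouped bar chart.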

Plotting a line graph of Manchester City players' overall and defensive skill scores to analyze how strong the players are in those areas. The plot shows that almost 50% of Manchester City's players have good defensive skill and overall scores; for example, Fernandinho has an overall score of 86 and a defensive skill score of 385.

Plotting a line graph of Manchester City players' overall and goalkeeping skill scores to analyze how strong the players are in those areas. The plot shows that Manchester City has 2 players with good goalkeeping skill and overall scores; for example, Ederson has an overall score of 86 and a goalkeeping skill score of 425.

Plotting a line graph of Leicester City players' overall and defensive skill scores to analyze how strong the players are in those areas. The plot shows that Leicester City has players with good defensive skill and overall scores; for example, Maguire has an overall score of 82 and a defensive skill score of 353.

Plotting a line graph of Leicester City players' overall and goalkeeping skill scores to analyze how strong the players are in those areas. The plot shows that Leicester City has players with good goalkeeping skill and overall scores; for example, Schmeichel has an overall score of 84 and a goalkeeping skill score of 409.

Plotting a line graph of Everton players' overall and defensive skill scores to analyze how strong the players are in those areas. The plot shows that Everton has players with good defensive skill and overall scores; for example, Gueye has an overall score of 83 and a defensive skill score of 332.

Plotting a line graph of Everton players' overall and goalkeeping skill scores to analyze how strong the players are in those areas. The plot shows that Everton has players with good goalkeeping skill and overall scores; for example, Pickford has an overall score of 88 and a goalkeeping skill score of 416.
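Each per-club line graph starts from the same selection: one club's players, sorted by the skill being examined. A small sketch, assuming `Club`, `Name` and `Overall` columns plus a summed skill column such as the hypothetical `DefensiveSkills`; the helper name is also hypothetical.

```python
import pandas as pd

def club_profile(df: pd.DataFrame, club: str, skill_col: str) -> pd.DataFrame:
    """Return one club's players sorted by skill_col, with Overall alongside."""
    squad = df.loc[df["Club"] == club, ["Name", "Overall", skill_col]]
    return squad.sort_values(skill_col, ascending=False).reset_index(drop=True)
```

The result can be passed to `DataFrame.plot(x="Name", y=["Overall", skill_col])` to draw the line graph. Note that the two scores are on different scales (a single rating versus a sum of several ratings).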

Conclusion: For Use Case 1, finding the club most likely to concede the fewest goals, we first chose the skills responsible for defense, i.e. the skills that help players stop opponents from scoring. We then filtered the players who play in defensive positions such as CDM and GK. After filtering, we sorted the top 10 clubs by defensive skill, fitness and goalkeeping scores. Based on this, Leicester City might concede fewer goals than the other clubs.

UseCase 2: What are the main differences between (1) players with the position “ST” (forward), (2) players with the position “CDM” (center defensive midfielder), and (3) players with the position “GK” (goalkeeper)?

Our approach is to treat the differences between the CDM, GK and ST positions as a classification problem. To do that, we collect each player's general playing skill scores, fitness scores, defensive scores and goalkeeping scores into a feature list, then train a k-nearest neighbours classifier (KNeighborsClassifier) to predict the position (CDM, ST or GK) from those features.

Visualizing the confusion matrix, we see 100% accuracy and F1 score, which means the classifier predicted every position (CDM, ST and GK) correctly from the features passed in.
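A sketch of the classification step with scikit-learn's `KNeighborsClassifier`, scored with a confusion matrix, accuracy and F1. The synthetic three-feature data, `k=5` and the 75/25 split are assumptions for illustration, not the notebook's settings. On real data with differently scaled features, standardizing them first (for example with `StandardScaler`) is worth considering, since k-NN is distance-based.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def evaluate_position_classifier(X, y, k=5, seed=0):
    """Fit k-NN on a stratified train split and score it on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    pred = clf.predict(X_test)
    return {
        "confusion": confusion_matrix(y_test, pred, labels=["CDM", "GK", "ST"]),
        "accuracy": accuracy_score(y_test, pred),
        "f1_macro": f1_score(y_test, pred, average="macro"),
    }

# Synthetic stand-in data; columns = [defensive, goalkeeping, finishing]
rng = np.random.default_rng(0)
centers = {"CDM": [80, 10, 50], "GK": [20, 80, 15], "ST": [30, 10, 85]}
X = np.vstack([rng.normal(c, 5, size=(40, 3)) for c in centers.values()])
y = np.repeat(list(centers), 40)  # 40 labels per position, same order as X
print(evaluate_position_classifier(X, y))
```

With well-separated groups like these, the confusion matrix is diagonal, which is the pattern the text describes.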

3D scatter plot of players by finishing, fitness score and goalkeeping skills. To distinguish goalkeepers from other positions such as ST and CDM we need the goalkeeping skill scores, since goalkeepers score far higher on these than ST and CDM players. Similarly, goalkeepers have lower fitness scores, as their role does not rely on physical skills such as shooting or sprinting. To distinguish strikers we use the finishing skill, since strikers play furthest forward, towards the opponents' goal, and are responsible for scoring.

3D scatter plot of players by Marking, StandingTackle and SlidingTackle. To distinguish central defensive midfielders from other positions such as ST and GK we need the defensive skill scores, since CDMs score higher on these than ST and GK players.
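A sketch of a 3D scatter coloured by position using matplotlib's `mplot3d` toolkit; the helper name and toy data are hypothetical. Passing Finishing, a fitness column and a goalkeeping column would give the first plot; Marking, StandingTackle and SlidingTackle give the second.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers 3D on older matplotlib)
import pandas as pd

def scatter_3d_by_position(df: pd.DataFrame, x: str, y: str, z: str):
    """3D scatter of three skill columns, one colour per Position value."""
    fig = plt.figure(figsize=(7, 6))
    ax = fig.add_subplot(projection="3d")
    for pos, group in df.groupby("Position"):
        ax.scatter(group[x], group[y], group[z], label=pos)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_zlabel(z)
    ax.legend()
    return fig, ax
```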

From both plots we can see that De Gea (GK) has strong goalkeeping skills (GKDiving, GKKicking, GKReflexes and GKHandling) but low scores in skills such as Marking, Crossing, SlidingTackle and Finishing, which belong to forward (ST) and CDM players. Likewise, Casemiro (CDM) and Cristiano Ronaldo (ST) each score well in the skills associated with their position.

Conclusion: To find the differences between players in the ST, CDM and GK positions, we used the characteristics of players in each position as features and trained a KNeighborsClassifier on them. It achieved 100% accuracy and F1 score, meaning it predicted every position correctly from the defensive, striker and goalkeeping features. For further analysis we plotted a radar chart for three players in different positions. It shows that De Gea (GK) has strong goalkeeping skills (GKDiving, GKKicking, GKReflexes and GKHandling) but low scores in skills such as Marking, Crossing, SlidingTackle and Finishing, which belong to forward (ST) and CDM players. Likewise, Casemiro (CDM) and Cristiano Ronaldo (ST) each score well in the skills associated with their position.

UseCase 3: What types of players with the position “ST” can be distinguished (e.g., “rather small and fast players”, “rather tall and strong players”, etc.)? What players are typical examples of these types?

Import Data

Filtering out columns that are not needed for the analysis

Converting Weight and Height to float
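The notebook's conversion code is not shown, so this is a sketch under assumed string formats: weight stored like `159lbs` and height stored as feet'inches like `5'7`. Converting height to total inches is our choice; the helper names are hypothetical.

```python
import pandas as pd

def to_float_weight(series: pd.Series) -> pd.Series:
    """'159lbs' -> 159.0 (assumes a trailing 'lbs' unit)."""
    return series.str.replace("lbs", "", regex=False).astype(float)

def to_float_height(series: pd.Series) -> pd.Series:
    """Feet'inches strings such as 5'7 -> total inches (5 * 12 + 7 = 67.0)."""
    parts = series.str.split("'", expand=True).astype(float)
    return parts[0] * 12 + parts[1]
```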

Separating strikers and non-strikers

Plot of the key differences between strikers and non-strikers

Separating the remaining players by position to compare them against strikers
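The comparisons that follow all rest on one table: the mean of each attribute for strikers, for non-strikers and for selected positions. A pandas sketch, assuming a `Position` column with `ST` for strikers; the helper name is hypothetical.

```python
import pandas as pd

def striker_comparison(df: pd.DataFrame, cols, positions) -> pd.DataFrame:
    """Mean of cols for strikers, non-strikers and each listed position."""
    is_st = df["Position"] == "ST"
    rows = {
        "ST": df.loc[is_st, cols].mean(),
        "non-ST": df.loc[~is_st, cols].mean(),
    }
    for pos in positions:
        rows[pos] = df.loc[df["Position"] == pos, cols].mean()
    return pd.DataFrame(rows).T  # one row per group, one column per attribute
```

For example, `striker_comparison(df, ["Stamina", "Strength"], ["LCB", "RB", "LWB", "RCM"]).plot.bar()` would draw a grouped bar chart of the kind described below.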

Plot that represents the average Stamina and Strength of strikers and non-strikers.

Blue represents strikers, who on average have greater strength than non-strikers. Left center-backs (LCB), however, generally have greater strength than strikers.

Similarly, strikers generally have less stamina than RB, LWB and RCM players, but are on par with the non-striker average.

Plot that represents the average SprintSpeed and Acceleration of strikers and non-strikers.

Blue represents strikers, who on average have greater sprint speed and acceleration than non-strikers (in orange). Left wing-backs (LWB) and right-backs (RB), however, generally have greater sprint speed and acceleration than strikers.

Plot that represents the average Finishing and Dribbling skills of strikers and non-strikers.

Blue represents strikers, who on average have better finishing than non-strikers or players in any other position.

However, strikers generally have weaker dribbling than center attacking midfielders (CAM), right center midfielders (RCM) and left wing-backs (LWB), though better than the non-striker average.

Top players such as Ronaldo, Lewandowski and Agüero far outperform the averages in both skills.

Plot that represents the average BallControl and Dribbling skills of strikers and non-strikers.

Blue represents strikers, who on average have better ball control than non-strikers.

However, strikers have roughly the same free-kick accuracy as the non-striker average.

Plot that represents the average LongPassing and ShortPassing skills of strikers and non-strikers.

On average, strikers have weaker long passing than non-strikers.

However, their short passing is roughly equal to the non-striker average.

Plot that represents the average LongPassing and ShortPassing skills of strikers, non-strikers and a few individual strikers.

Top-rated players Ronaldo and Lewandowski outperform the average of every position in short passing, while Worman does not.

Top-rated strikers generally have better long passing than most positional averages.

Creating a separate DataFrame to store the Euclidean distance between each player's score in each area and the striker average for that area.

For example, Ronaldo's dribbling score is compared with the average dribbling score of all strikers.

The Euclidean distance measures how far apart the two scores are; for a single feature it is simply the absolute difference.

Selecting only the strikers, we take the average of the Euclidean distances over these 5 features (LongPassing, BallControl, Finishing, Dribbling, Weight).

More features can be added for more accuracy.

The players with the lowest average distance can be considered to have the features of a typical striker.
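A pandas sketch of this ranking, assuming `Name` and `Position` columns and the five features listed above; the helper name is hypothetical. As described, the features are not rescaled, so Weight (in lbs) counts on its own scale; standardizing the features first would be an alternative design.

```python
import pandas as pd

FEATURES = ["LongPassing", "BallControl", "Finishing", "Dribbling", "Weight"]

def typical_strikers(df: pd.DataFrame, features=FEATURES, n: int = 3) -> pd.DataFrame:
    """Rank strikers by mean per-feature distance to the striker average (lowest first)."""
    strikers = df[df["Position"] == "ST"]
    mean = strikers[features].mean()
    # For a single feature, the Euclidean distance is |x - mean|
    dist = (strikers[features] - mean).abs()
    result = strikers[["Name"]].assign(AvgDistance=dist.mean(axis=1))
    return result.nsmallest(n, "AvgDistance")
```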

Here, Miguel, Rosseti and Vergos stand out as typical strikers.